
    PACE: Pattern Accurate Computationally Efficient Bootstrapping for Timely Discovery of Cyber-Security Concepts

    Public disclosure of important security information, such as knowledge of vulnerabilities or exploits, often occurs in blogs, tweets, mailing lists, and other online sources months before proper classification into structured databases. To facilitate timely discovery of such knowledge, we propose a novel semi-supervised learning algorithm, PACE, for identifying and classifying relevant entities in text sources. The main contribution of this paper is an enhancement of the traditional bootstrapping method for entity extraction by employing a time-memory trade-off that simultaneously circumvents a costly corpus search while strengthening pattern nomination, which should increase accuracy. An implementation in the cyber-security domain is discussed, as well as challenges to Natural Language Processing imposed by the security domain. Comment: 6 pages, 3 figures, ieeeTran conference. International Conference on Machine Learning and Applications 201
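    The general idea of bootstrapped entity extraction with a precomputed index can be sketched as follows. This is a minimal illustration and not the PACE algorithm itself: the function names (`build_index`, `bootstrap`), the one-token left/right context used as a "pattern", and the `min_support` threshold are all assumptions made for the example. The index is built once and held in memory, so each round looks patterns up instead of rescanning the corpus.

```python
from collections import defaultdict


def build_index(sentences):
    """Scan the corpus once, mapping context patterns to tokens and back.

    A "pattern" here is just the (previous token, next token) pair,
    a deliberately simple stand-in chosen for this sketch.
    """
    by_pattern = defaultdict(set)  # pattern -> tokens seen in that context
    by_token = defaultdict(set)    # token -> patterns it was seen in
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            pattern = (tokens[i - 1], tokens[i + 1])
            by_pattern[pattern].add(tokens[i])
            by_token[tokens[i]].add(pattern)
    return by_pattern, by_token


def bootstrap(sentences, seeds, rounds=2, min_support=2):
    """Grow an entity set from seeds using index lookups only."""
    by_pattern, by_token = build_index(sentences)
    entities = set(seeds)
    for _ in range(rounds):
        # Nominate patterns shared by at least `min_support` known entities.
        support = defaultdict(int)
        for entity in entities:
            for pattern in by_token.get(entity, ()):
                support[pattern] += 1
        patterns = {p for p, n in support.items() if n >= min_support}

        # Every token that fills a nominated pattern becomes a candidate.
        candidates = set()
        for pattern in patterns:
            candidates |= by_pattern[pattern]

        if candidates <= entities:
            break  # nothing new was found, so stop early
        entities |= candidates
    return entities


if __name__ == "__main__":
    corpus = [
        "exploit for heartbleed was published",
        "exploit for shellshock was published",
        "exploit for poodle was published",
        "patch for poodle is ready",
    ]
    print(sorted(bootstrap(corpus, {"heartbleed", "shellshock"})))
```

    With the toy corpus above and the seeds `heartbleed` and `shellshock`, the shared context ("for", "was") is nominated and `poodle` is added. The extra memory spent on the two dictionaries is what replaces the repeated corpus search.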

    Balancing Interactive Data Management of Massive Data with Situational Awareness through Smart Aggregation

    Designing a visualization system capable of processing, managing, and presenting massive data sets while maximizing the user’s situational awareness (SA) is a challenging, but important, research question in visual analytics. Traditional data management and interactive retrieval approaches have often focused on solving the data overload problem at the expense of the user’s SA. This paper discusses various data management strategies and the strengths and limitations of each approach in providing the user with SA. A new data management strategy, coined Smart Aggregation, is presented as a powerful approach to overcome the challenges of both handling massive data sets and maintaining SA. By combining automatic data aggregation with user-defined controls on what, how, and when data should be aggregated, we present a visualization system that can handle massive amounts of data while affording the user the best possible SA. This approach ensures that a system is always usable in terms of both system resources and human perceptual resources. We have implemented our Smart Aggregation approach in a visual analytics system called VIAssist (Visual Assistant for Information Assurance Analysis) to facilitate exploration, discovery, and SA in th

    The Influence of Visual Provenance Representations on Strategies in a Collaborative Hand-off Data Analysis Scenario

    Data analysis tasks rarely occur in isolation. Especially in intelligence analysis scenarios where different experts contribute knowledge to a shared understanding, members must communicate how insights develop to establish common ground among collaborators. The use of provenance to communicate analytic sensemaking carries promise by describing the interactions and summarizing the steps taken to reach insights. Yet, no universal guidelines exist for communicating provenance in different settings. Our work focuses on the presentation of provenance information and the resulting conclusions reached and strategies used by new analysts. In an open-ended, 30-minute, textual exploration scenario, we qualitatively compare how adding different types of provenance information (specifically data coverage and interaction history) affects analysts' confidence in conclusions developed, propensity to repeat work, filtering of data, identification of relevant information, and typical investigation strategies. We see that data coverage (i.e., what was interacted with) provides provenance information without limiting individual investigation freedom. On the other hand, while interaction history (i.e., when something was interacted with) does not significantly encourage more mimicry, it does take more time to comfortably understand, as represented by less confident conclusions and less relevant information-gathering behaviors. Our results contribute empirical data towards understanding how provenance summarizations can influence analysis behaviors. Comment: to be published in IEEE Vis 202

    Global analysis of the mammalian RNA degradome reveals widespread miRNA-dependent and miRNA-independent endonucleolytic cleavage

    The Ago2 component of the RNA-induced silencing complex (RISC) is an endonuclease that cleaves mRNAs that base pair with high complementarity to RISC-bound microRNAs. Many examples of such direct cleavage have been identified in plants, but not in vertebrates, despite the conservation of catalytic capacity in vertebrate Ago2. We performed parallel analysis of RNA ends (PARE), a deep sequencing approach that identifies 5′-phosphorylated, polyadenylated RNAs, to detect potential microRNA-directed mRNA cleavages in mouse embryo and adult tissues. We found that numerous mRNAs are potentially targeted for cleavage by endogenous microRNAs, but at very low levels relative to the mRNA abundance, apart from miR-151-5p-guided cleavage of the N4BP1 mRNA. We also found numerous examples of non-miRNA-directed cleavage, including cleavage of a group of mRNAs within a CA-repeat consensus sequence. The PARE analysis also identified many examples of adenylated small non-coding RNAs, including microRNAs, tRNA processing intermediates and various other small RNAs, consistent with adenylation being part of a widespread proof-reading and/or degradation pathway for small RNAs.

    Identification of extracellular glycerophosphodiesterases in Pseudomonas and their role in soil organic phosphorus remineralisation

    In soils, phosphorus (P) exists in numerous organic and inorganic forms. However, plants can only acquire inorganic orthophosphate (Pi), meaning global crop production is frequently limited by P availability. To overcome this problem, rock phosphate fertilisers are heavily applied, often with negative environmental and socio-economic consequences. The organic P fraction of soil contains phospholipids that are rapidly degraded, resulting in the release of bioavailable Pi. However, the mechanisms behind this process remain unknown. We identified and experimentally confirmed the function of two secreted glycerolphosphodiesterases, GlpQI and GlpQII, found in Pseudomonas stutzeri DSM4166 and Pseudomonas fluorescens SBW25, respectively. A series of co-cultivation experiments revealed that in these Pseudomonas strains, cleavage of glycerolphosphorylcholine and its breakdown product G3P occurs extracellularly, allowing other bacteria to benefit from this metabolism. Analyses of metagenomic and metatranscriptomic datasets revealed that this trait is widespread among soil bacteria, with Actinobacteria and Proteobacteria, specifically Betaproteobacteria and Gammaproteobacteria, the likely major players.